<% component = metadata.transforms.regex_parser %>

<%= component_header(component) %>

## Config File

<%= component_config_example(component) %>

## Options

<%= options_table(component.options.to_h.values.sort) %>

## Examples

Given the following log line:

{% code-tabs %}
{% code-tabs-item title="log" %}
```json
{
  "message": "5.86.210.12 - zieme4647 5667 [19/06/2019:17:20:49 -0400] \"GET /embrace/supply-chains/dynamic/vertical\" 201 20574"
}
```
{% endcode-tabs-item %}
{% endcode-tabs %}

And the following configuration:

{% code-tabs %}
{% code-tabs-item title="vector.toml" %}
```toml
[transforms.<transform-id>]
  type = "regex_parser"
  field = "message"
  regex = '^(?P<host>[\w\.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$'

[transforms.<transform-id>.types]
  bytes_int = "int"
  timestamp = "timestamp|%m/%d/%Y:%H:%M:%S %z"
  status = "int"
  bytes_out = "int"
```
{% endcode-tabs-item %}
{% endcode-tabs %}

A [`log` event][docs.log_event] will be emitted with the following structure:

```javascript
{
  // ... existing fields
  "bytes_in": 5667,
  "host": "5.86.210.12",
  "user_id": "zieme4647",
  "timestamp": <19/06/2019:17:20:49 -0400>,
  "message": "GET /embrace/supply-chains/dynamic/vertical",
  "status": 201,
  "bytes": 20574
}
```

Things to note about the output:

1. The `message` field was overwritten.
2. The `bytes_in`, `timestamp`, `status`, and `bytes_out` fields were coerced.


## How It Works [[sort]]

<%= component_sections(component) %>

### Failed Parsing

If the `field` value fails to parse against the provided `regex` then an error
will be [logged][docs.monitoring_logs] and the event will be kept or discarded
depending on the `drop_failed` value.

A failure includes any event that does not successfully parse against the
provided `regex`. This includes bad values as well as events missing the
specified `field`.

### Performance

The `regex_parser` source has been involved in the following performance tests:

* [`regex_parsing_performance`][url.regex_parsing_performance_test]

Learn more in the [Performance][docs.performance] sections.

### Regex Debugger

To test the validity of the `regex` option, we recommend the [Golang Regex
Tester][url.regex_tester] as it's Regex syntax closely 
follows Rust's.

### Regex Syntax

Vector follows the [documented Rust Regex syntax][url.rust_regex_syntax] since
Vector is written in Rust. This syntax follows a Perl-style regular expression
syntax, but lacks a few features like look around and backreferences.

#### Named Captures

You can name Regex captures with the `<name>` syntax. For example:

```
^(?P<timestamp>.*) (?P<level>\w*) (?P<message>.*)$
```

Will capture `timestamp`, `level`, and `message`. All values are extracted as
`string` values and must be coerced with the `types` table.

More info can be found in the [Regex grouping and flags
documentation][url.regex_grouping_and_flags].

#### Flags

Regex flags can be toggled with the `(?flags)` syntax. The available flags are:

| Flag | Descriuption |
| :--- | :----------- |
| `i`  | case-insensitive: letters match both upper and lower case |
| `m`  | multi-line mode: ^ and $ match begin/end of line |
| `s`  | allow . to match `\n` |
| `U`  | swap the meaning of `x*` and `x*?` |
| `u`  | Unicode support (enabled by default) |
| `x`  | ignore whitespace and allow line comments (starting with `#`)

For example, to enable the case-insensitive flag you can write:

```
(?i)Hello world
```

More info can be found in the [Regex grouping and flags
documentation][url.regex_grouping_and_flags].


## Troubleshooting

<%= component_troubleshooting(component) %>

## Resources

<%= component_resources(component) %>